Update ARM CPU experimental kernels from AO to leverage pip install #1458
Conversation
metascroy commented on Jan 15, 2025
- torchao experimental CPU kernels are now installed and loaded automatically by pip.
- Switch quantization to use the new quantize_ API (a minimal sketch of the new API follows below)
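For readers unfamiliar with the new torchao API, the switch amounts to calling the quantize_ entry point with a quantization config instead of wrapping the model in a standalone quantizer class. A minimal sketch of that shape (int8_weight_only here is just a stand-in config, not the one this PR uses; see the more specific sketch later in the thread):

```python
import torch
import torch.nn as nn
from torchao.quantization import quantize_, int8_weight_only

# Toy model standing in for a torchchat transformer.
model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 64))

# quantize_ mutates the model in place, replacing Linear weights with
# tensor-subclass-backed quantized weights.
quantize_(model, int8_weight_only())

out = model(torch.randn(2, 256))
```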
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1458
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure as of commit 8a9a644 with merge base e5cf6e5.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Thanks for making the pip install work with the subclass APIs!!
cc: @manuelcandales Can we have this for MPS too, please? 🥺🥺 (Separate PR)
Awaiting pytorch/executorch#7759
@Jack-Khuu did you update the AO version used in ET?
Yup, pytorch/executorch@9836b39 points to pytorch/ao@11333ba
Co-authored-by: Jack-Khuu <[email protected]>
Force-pushed from 8ebf63f to 0abe175
Hello @metascroy @Jack-Khuu, what is the plan to get this into mainline? We would like to use the KleidiAI kernels from aten via this quantizer path. Let us know if we need to raise a new PR.
Hi @nikhil-arm, we're still planning to land this. Can you share the specific commit hashes y'all need?
@nikhil-arm We've bumped the AO pin on main.
After a suite of rebases, pin bumps, and splitting up tests, we know what we're tackling:
Sorry about the delay @nikhil-arm. @Jack-Khuu, let's try to get this landed within the next week. Bumping the AO pin in torchchat had various conflicts with the CI, but I think we can commit to making this work. I think it does make sense to first land pytorch/ao#1836 in torchao before bumping, because the old quantizers have already been deprecated there in favor of quantize_.
Re-reviewed, and things are looking great.
Thanks again
Hi @nikhil-arm, the experimental kernels from AO are now in. Can you share a link to the KleidiAI kernels (either here or in a GH issue if you can share more context)?
For context for both @nikhil-arm and @Jack-Khuu: there are now two locations of KleidiAI kernels across PyTorch and torchao.

PyTorch itself has a 4-bit quantized linear op backed by KleidiAI, and I think that is what @nikhil-arm wants to enable in torchchat. To do this, you need to pass "aten" as the target in PackedLinearInt8DynamicActivationIntxWeightLayout(target="aten") here: https://github.com/pytorch/torchchat/blob/main/torchchat/utils/quantize.py#L150. I will leave it to @nikhil-arm to put up a PR to pipe this through. This should then work with most surfaces, except ExecuTorch, although I don't know whether it has been tested in anything other than eager mode.
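For illustration, a minimal sketch of what piping that target choice through torchchat's quantize path might look like (the import paths, the int8_dynamic_activation_intx_weight helper, the group size, and the use_aten_target knob are assumptions based on the torchao experimental API referenced in this thread, and may differ across torchao versions):

```python
import torch
from torchao.quantization.quant_api import quantize_
from torchao.quantization.granularity import PerGroup
# Assumed import locations for the experimental pieces; these have moved
# between torchao releases.
from torchao.experimental.quant_api import int8_dynamic_activation_intx_weight
from torchao.experimental.packed_linear_int8_dynamic_activation_intx_weight_layout import (
    PackedLinearInt8DynamicActivationIntxWeightLayout,
)


def quantize_for_arm_cpu(model, group_size: int = 256, use_aten_target: bool = False):
    # target="aten" routes to PyTorch's KleidiAI-backed 4-bit linear op;
    # leaving the target at its default uses torchao's own lowbit CPU kernels.
    layout = (
        PackedLinearInt8DynamicActivationIntxWeightLayout(target="aten")
        if use_aten_target
        else PackedLinearInt8DynamicActivationIntxWeightLayout()
    )
    quantize_(
        model,
        int8_dynamic_activation_intx_weight(
            weight_dtype=torch.int4,
            granularity=PerGroup(group_size),
            has_weight_zeros=False,
            layout=layout,
        ),
    )
    return model
```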
We have also pulled in KleidiAI kernels in torchao itself. These should work on all surfaces, including ExecuTorch. To enable them, we need to add the flag TORCHAO_BUILD_KLEIDIAI=1 before the pip install (see torchchat/install/install_torchao.sh, line 38, at commit 8a9a644). When enabled, PackedLinearInt8DynamicActivationIntxWeightLayout() will use the KleidiAI kernel when supported (4-bit quantization, has_weight_zeros=False) and fall back to our native torchao kernels in other cases. You can see which kernel is selected at runtime by setting the environment variable TORCH_CPP_LOG_LEVEL=Info.

Eventually we want to enable this flag by default, but I have it disabled right now for two reasons: 1) the flag increases install time because it clones and builds KleidiAI; 2) there is limited perf benefit to enabling it right now, comparing the KleidiAI kernels we have enabled in torchao (GEMV neondot kernels) against the native torchao Arm kernels (also GEMV neondot kernels).
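As a concrete illustration of the two knobs above, a sketch of enabling the KleidiAI build and the kernel-selection logging (the script path and the Info value are taken from the comment above; the exact invocation is an assumption, not a documented workflow):

```python
# Sketch: build torchao's experimental kernels with KleidiAI enabled, then
# surface kernel-selection logs at runtime. Assumes it is run from the
# torchchat repo root.
import os
import subprocess

# TORCHAO_BUILD_KLEIDIAI=1 makes the torchao build clone and compile KleidiAI,
# which is why it lengthens the install.
env = dict(os.environ, TORCHAO_BUILD_KLEIDIAI="1")
subprocess.run(["bash", "install/install_torchao.sh"], check=True, env=env)

# With this set, the C++ side logs whether the KleidiAI kernel or the native
# torchao kernel is picked for each quantized linear.
os.environ["TORCH_CPP_LOG_LEVEL"] = "Info"
```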